docs for graphs that depend on assets #12597

sryza · 2023-02-28T18:41:18Z

Summary & Motivation

Motivated by this feedback: https://dagster.slack.com/archives/C01U5LFUZJS/p1677548018200809

How I Tested These Changes

erinkcochran87

Left a few comments, but looks good!

docs/content/concepts/ops-jobs-graphs/graphs.mdx

docs/content/guides/dagster/how-assets-relate-to-ops-and-graphs.mdx

spenczar · 2023-02-28T20:57:53Z

docs/content/concepts/ops-jobs-graphs/graphs.mdx

@@ -262,6 +263,36 @@ Note that in most cases, it is usually possible to pass some data dependency. In

 Dagster also provides more advanced abstractions to handle dependencies and IO. If you find that you are finding it difficult to model data dependencies when using external storage, check out [IO managers](/concepts/io-management/io-managers).

+### Loading an asset as an input
+
+You can supply an asset as an input to one of the ops in a graph. Dagster can then use the [IO manager](/concepts/io-management/io-managers) on the asset to load the input value for the op.


How do I override the IOManager used by the asset? For assets, I can do that with @asset(ins={"key": AssetIn(..., input_manager_key="overriding_io_mgr")}). How do I do it with ops?

Here are some docs: https://docs.dagster.io/concepts/io-management/unconnected-inputs#providing-an-input-manager-for-an-unconnected-input.

Oh, I didn't put together that this is an "unconnected input" but I guess that makes sense, OK.

spenczar · 2023-02-28T20:59:00Z

docs/content/concepts/ops-jobs-graphs/graphs.mdx

+If the asset is partitioned, then:
+
+- If the job is partitioned, the corresponding partition of the asset will be loaded.
+- If the job is not partitioned, then all partitions of the asset will be loaded.
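
The two rules above can be modeled in a few lines of plain Python. This is an illustrative sketch, not Dagster's implementation: `partitions_to_load` is a hypothetical function, and it assumes the job and the asset share the same partition keys, so that the "corresponding" partition is simply the one with the same key.

```python
from typing import List, Optional, Sequence


def partitions_to_load(
    asset_partition_keys: Sequence[str],
    job_partition_key: Optional[str] = None,
) -> List[str]:
    """Model which asset partitions feed the op input (hypothetical helper)."""
    # Unpartitioned job: every partition of the asset is loaded.
    if job_partition_key is None:
        return list(asset_partition_keys)
    # Partitioned job: only the partition with the matching key is loaded.
    # Assumption: job and asset use identical partition keys.
    if job_partition_key not in asset_partition_keys:
        raise KeyError(f"asset has no partition {job_partition_key!r}")
    return [job_partition_key]
```

Calling `partitions_to_load(keys)` without a job partition key models the unpartitioned case; passing one models a single run of a partitioned job.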


What does "all partitions will be loaded" mean for the shape of the value? Is it a list, or a dictionary, or a generator, or something else? I'm wondering how I ought to write my op to handle that.

The type depends on the I/O manager implementation:

The Pandas and PySpark type handlers of the DB IO managers (Snowflake, DuckDB, BigQuery) always return a single DataFrame, which can include values from all the partitions.

When loading an input that corresponds to multiple partitions, the UPathIOManager returns a dictionary that maps each input partition key to the input value for that partition key.
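
One way to write an op body that tolerates both shapes described above is to normalize the loaded value first. The sketch below is plain Python with no Dagster imports. `normalize_partitioned_input`, `count_rows`, and `ALL_PARTITIONS` are hypothetical names, and the sketch assumes the multi-partition case arrives as a dict keyed by partition key, while anything else is a single combined value. A single-partition value that is itself a dict would be misclassified, so treat this as a heuristic rather than a general solution.

```python
from typing import Any, Dict

# Hypothetical placeholder key for values that are not split by partition.
ALL_PARTITIONS = "__all__"


def normalize_partitioned_input(value: Any) -> Dict[str, Any]:
    """Return a {partition_key: value} mapping regardless of input shape."""
    # Dict case: assumed to map each partition key to that partition's value.
    if isinstance(value, dict):
        return dict(value)
    # Anything else: one combined value covering all loaded partitions.
    return {ALL_PARTITIONS: value}


def count_rows(value: Any) -> int:
    """Example op body: total number of rows across the loaded partitions."""
    normalized = normalize_partitioned_input(value)
    return sum(len(rows) for rows in normalized.values())
```

With this in place, the op logic only has to iterate over a mapping, whichever I/O manager produced the input.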

This needs better docs, but I don't think this is the right place to put them.

sryza requested a review from erinkcochran87 as a code owner February 28, 2023 18:41

erinkcochran87 reviewed Feb 28, 2023

spenczar reviewed Feb 28, 2023

sryza force-pushed the source-asset-graph-input-docs branch from 2f0f0b4 to a01726c Compare February 28, 2023 22:15

sryza requested a review from erinkcochran87 February 28, 2023 22:16

docs for graphs that depend on assets

016d205

sryza force-pushed the source-asset-graph-input-docs branch from a01726c to 016d205 Compare February 28, 2023 23:03

erinkcochran87 approved these changes Mar 1, 2023

sryza merged commit dab3831 into master Mar 1, 2023

sryza deleted the source-asset-graph-input-docs branch March 1, 2023 16:18

louis-jaris mentioned this pull request May 14, 2024

Partitioned jobs with partitioned source assets as input #13357

Open
